D² Pinball Score (d2_pinball_score)#
d2_pinball_score is a regression score that measures the fraction of pinball loss (quantile loss) explained by a model relative to a simple baseline that always predicts the empirical (\alpha)-quantile of y_true.
Best possible score: 1.0
Baseline score (constant (\alpha)-quantile predictor): 0.0
Can be negative (worse than the baseline)
It’s the quantile-regression analogue of (R^2): “how much better than the best constant quantile prediction?”.
Learning goals#
Define pinball loss and the D² pinball score (with math)
Build intuition for the asymmetric penalty and the quantile baseline
Implement d2_pinball_score from scratch in NumPy (weights + multioutput)
Fit a simple linear quantile regression model with subgradient descent
Know pros/cons and when to use D² pinball
Quick import#
from sklearn.metrics import d2_pinball_score
Prerequisites#
quantiles / percentiles
basic regression notation
piecewise functions
import warnings
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from sklearn.linear_model import LinearRegression, QuantileRegressor
from sklearn.metrics import d2_pinball_score, mean_pinball_loss
from sklearn.model_selection import train_test_split
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
1) Pinball loss (quantile loss)#
For a quantile level (\alpha \in [0, 1]), define the pinball loss for one sample as follows.
Let (u = y - \hat{y}) (true minus prediction). Then

$$\rho_\alpha(u) = \max\big(\alpha u, \, (\alpha - 1) u\big)$$

Equivalently (piecewise):

$$\rho_\alpha(u) = \begin{cases} \alpha \, u & u \ge 0 \\ (\alpha - 1) \, u & u < 0 \end{cases}$$

The mean pinball loss over (n) samples is:

$$L_\alpha(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \rho_\alpha(y_i - \hat{y}_i)$$
Interpretation:
(\alpha) controls asymmetry: for (\alpha=0.9), under-prediction is penalized 9x as much as over-prediction.
(\alpha=0.5) gives (\rho_{0.5}(y,\hat{y}) = \tfrac12|y-\hat{y}|), so d2_pinball_score(alpha=0.5) matches the MAE-based D² score (d2_absolute_error_score).
def pinball_loss_residual(u, alpha):
# Pinball loss as a function of residual u = y - y_pred
u = np.asarray(u)
return np.where(u >= 0, alpha * u, (alpha - 1) * u)
u = np.linspace(-4, 4, 400)
alphas = [0.1, 0.5, 0.9]
fig = go.Figure()
for a in alphas:
fig.add_trace(
go.Scatter(
x=u,
y=pinball_loss_residual(u, a),
mode="lines",
name=f"alpha={a}",
)
)
fig.add_vline(x=0, line_width=1, line_dash="dash", line_color="gray")
fig.update_layout(
title='Pinball loss is a "tilted" absolute value',
xaxis_title="residual u = y - y_pred",
yaxis_title="rho_alpha(u)",
height=380,
)
fig.show()
2) The baseline and the D² pinball score#
2.1 Best constant predictor = empirical (\alpha)-quantile#
Suppose you restrict yourself to a constant prediction (c) for every sample. You minimize:

$$L(c) = \frac{1}{n} \sum_{i=1}^{n} \rho_\alpha(y_i - c)$$

A subgradient w.r.t. (c) is:

$$\frac{\partial L}{\partial c} = \frac{1}{n} \Big[ (1 - \alpha) \, \#\{i : y_i < c\} \; - \; \alpha \, \#\{i : y_i > c\} \Big]$$

So at an optimum, roughly an (\alpha) fraction of samples lie below (c). That is exactly the definition of an empirical (\alpha)-quantile.
With sample weights (w_i \ge 0), the condition becomes:

$$\frac{\sum_{i : y_i < c} w_i}{\sum_i w_i} \approx \alpha$$
2.2 D² pinball score#
Let (L_\alpha(y, \hat{y})) be the mean pinball loss for your model predictions (\hat{y}). Let (\tilde{y}) be the baseline predictions that always equal the empirical (\alpha)-quantile of (y).
Then:

$$D^2_\alpha = 1 - \frac{L_\alpha(y, \hat{y})}{L_\alpha(y, \tilde{y})}$$
(D^2=1) for perfect predictions (zero pinball loss)
(D^2=0) for the constant quantile baseline
(D^2<0) if your model is worse than the baseline
Edge cases (matching scikit-learn):
with fewer than 2 samples, the score is undefined (nan)
if the baseline loss is 0 (e.g. constant targets), sklearn returns:
1.0 for perfect predictions
0.0 otherwise
# The empirical alpha-quantile minimizes pinball loss among constant predictors
n = 250
y = rng.normal(loc=0.0, scale=1.0, size=n)
y[:15] += rng.exponential(scale=4.0, size=15) # right-tail outliers
alpha = 0.9
q = float(np.percentile(y, alpha * 100))
c_grid = np.linspace(y.min() - 1, y.max() + 1, 300)
loss_grid = np.array([mean_pinball_loss(y, np.full_like(y, c), alpha=alpha) for c in c_grid])
c_star = float(c_grid[np.argmin(loss_grid)])
loss_star = float(loss_grid.min())
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(
"Mean pinball loss vs constant prediction",
"y distribution",
),
)
fig.add_trace(go.Scatter(x=c_grid, y=loss_grid, mode="lines"), row=1, col=1)
fig.add_vline(x=q, line_dash="dash", line_color="#4C78A8", row=1, col=1)
fig.add_vline(x=c_star, line_dash="dot", line_color="#E45756", row=1, col=1)
fig.add_trace(
go.Scatter(x=[c_star], y=[loss_star], mode="markers", marker=dict(size=9, color="#E45756"), showlegend=False),
row=1,
col=1,
)
fig.update_xaxes(title_text="constant prediction c", row=1, col=1)
fig.update_yaxes(title_text="mean pinball loss", row=1, col=1)
fig.add_trace(go.Histogram(x=y, nbinsx=40, marker_color="rgba(0,0,0,0.65)"), row=1, col=2)
fig.add_vline(x=q, line_dash="dash", line_color="#4C78A8", row=1, col=2)
fig.update_xaxes(title_text="y", row=1, col=2)
fig.update_yaxes(title_text="count", row=1, col=2)
fig.update_layout(
height=360,
title=f"Baseline for D²: empirical {alpha:.1f}-quantile (q={q:.3f})",
showlegend=False,
)
fig.show()
q, c_star
(1.412878104406565, 1.4261650061141289)
3) Interpreting D²#
Think of (L_\alpha(y,\tilde{y})) as the loss of a “no-features” model: it only knows the labels and predicts a constant quantile.
Then:

$$D^2_\alpha = 1 - \frac{L_\alpha(y, \hat{y})}{L_\alpha(y, \tilde{y})} = \frac{L_\alpha(y, \tilde{y}) - L_\alpha(y, \hat{y})}{L_\alpha(y, \tilde{y})}$$

So if (D^2=0.3), your model reduces pinball loss by 30% compared to the constant-quantile baseline on that dataset.
Like (R^2), D² is mainly for comparing models on the same target/dataset; it’s not a scale-free number you can compare across unrelated problems.
# Tiny examples
y_true = np.array([1, 2, 3])
y_pred = np.array([1, 3, 3])
for a in [0.1, 0.5, 0.9]:
print(f"alpha={a}: D2={d2_pinball_score(y_true, y_pred, alpha=a):.4f}")
alpha = 0.9
q = np.percentile(y_true, alpha * 100)
y_baseline = np.full_like(y_true, q, dtype=float)
print()
print("Baseline D2 (should be 0):", d2_pinball_score(y_true, y_baseline, alpha=alpha))
print("Perfect D2 (should be 1):", d2_pinball_score(y_true, y_true, alpha=alpha))
alpha=0.1: D2=-1.0455
alpha=0.5: D2=0.5000
alpha=0.9: D2=0.7727
Baseline D2 (should be 0): 0.0
Perfect D2 (should be 1): 1.0
# D² as predictions degrade from perfect -> noisy
n = 400
y_true = rng.standard_t(df=3, size=n) # heavy tails
# fixed direction of noise so the curve is easier to read
noise = rng.normal(size=n)
sigmas = np.linspace(0, 2.5, 60)
alphas = [0.1, 0.5, 0.9]
fig = go.Figure()
for a in alphas:
d2_vals = [d2_pinball_score(y_true, y_true + s * noise, alpha=a) for s in sigmas]
fig.add_trace(go.Scatter(x=sigmas, y=d2_vals, mode="lines", name=f"alpha={a}"))
fig.add_hline(y=0, line_dash="dash", line_color="gray")
fig.add_hline(y=1, line_dash="dash", line_color="gray")
fig.update_layout(
title="As predictions get noisier, D² pinball decreases (can go negative)",
xaxis_title="noise scale added to perfect predictions",
yaxis_title="D² pinball score",
height=380,
)
fig.show()
4) From-scratch NumPy implementation#
Below is a small NumPy implementation that mirrors scikit-learn’s behavior:
supports 1D targets (n) and multioutput targets ((n, m))
supports sample_weight
supports multioutput aggregation:
'raw_values' (per-output scores)
'uniform_average' (simple mean)
array-like weights of shape ((m,))
uses the same baseline as scikit-learn:
unweighted: np.percentile(y_true, q=alpha*100, axis=0)
weighted: a lower weighted percentile (first value where the weight CDF reaches (\alpha))
Edge cases:
with fewer than 2 samples: returns nan
if the baseline loss is zero (e.g. constant targets):
perfect predictions → 1.0
imperfect predictions → 0.0
def _as_2d(y):
y = np.asarray(y)
if y.ndim == 1:
return y.reshape(-1, 1)
if y.ndim != 2:
raise ValueError(f"y must be 1D or 2D, got shape {y.shape}")
return y
def _weighted_percentile_lower(array, sample_weight, percentile):
# Lower weighted percentile (mirrors sklearn.utils.stats._weighted_percentile)
array = np.asarray(array)
sample_weight = np.asarray(sample_weight)
n_dim = array.ndim
if n_dim == 0:
return array[()]
if array.ndim == 1:
array = array.reshape((-1, 1))
if sample_weight.ndim == 1:
if sample_weight.shape[0] != array.shape[0]:
raise ValueError("sample_weight must have shape (n_samples,) when 1D")
sample_weight = np.tile(sample_weight.reshape(-1, 1), (1, array.shape[1]))
elif sample_weight.shape != array.shape:
raise ValueError("sample_weight must have same shape as array, or shape (n_samples,)")
if np.any(sample_weight < 0):
raise ValueError("sample_weight must be non-negative")
sorted_idx = np.argsort(array, axis=0)
sorted_array = np.take_along_axis(array, sorted_idx, axis=0)
sorted_weights = np.take_along_axis(sample_weight, sorted_idx, axis=0)
weight_cdf = np.cumsum(sorted_weights, axis=0)
total_weight = weight_cdf[-1]
adjusted = percentile / 100.0 * total_weight
# For percentile=0, ignore leading observations with sample_weight=0 (sklearn behavior)
mask = adjusted == 0
if np.any(mask):
adjusted = adjusted.astype(float, copy=False)
adjusted[mask] = np.nextafter(adjusted[mask], adjusted[mask] + 1)
idx = np.array(
[np.searchsorted(weight_cdf[:, i], adjusted[i]) for i in range(weight_cdf.shape[1])],
dtype=int,
)
idx = np.clip(idx, 0, array.shape[0] - 1)
out = sorted_array[idx, np.arange(array.shape[1])]
return out[0] if n_dim == 1 else out
def mean_pinball_loss_numpy(
y_true,
y_pred,
*,
alpha=0.5,
sample_weight=None,
multioutput="uniform_average",
):
# NumPy implementation of mean pinball loss
if not (0.0 <= float(alpha) <= 1.0):
raise ValueError("alpha must be in [0, 1]")
y_true = np.asarray(y_true)
y_pred = np.asarray(y_pred)
if y_true.shape != y_pred.shape:
raise ValueError(f"y_true and y_pred must have the same shape, got {y_true.shape} vs {y_pred.shape}")
Y_true = _as_2d(y_true)
Y_pred = _as_2d(y_pred)
n_samples, n_outputs = Y_true.shape
u = Y_true - Y_pred # residual = y - y_pred
loss = np.where(u >= 0, alpha * u, (alpha - 1) * u) # always >= 0
if sample_weight is None:
out = loss.mean(axis=0)
else:
w = np.asarray(sample_weight).reshape(-1)
if w.shape[0] != n_samples:
raise ValueError(f"sample_weight must have shape (n_samples,), got {w.shape}")
if np.any(w < 0):
raise ValueError("sample_weight must be non-negative")
w_sum = w.sum()
if w_sum == 0:
raise ValueError("sum(sample_weight) must be positive")
out = (loss * w.reshape(-1, 1)).sum(axis=0) / w_sum
if isinstance(multioutput, str):
if multioutput == "raw_values":
return out
if multioutput == "uniform_average":
return float(np.mean(out))
raise ValueError("multioutput must be 'raw_values' or 'uniform_average' or an array-like")
else:
weights = np.asarray(multioutput).reshape(-1)
if weights.shape[0] != n_outputs:
raise ValueError(f"multioutput weights must have shape (n_outputs,), got {weights.shape}")
return float(np.average(out, weights=weights))
def d2_pinball_score_numpy(
y_true,
y_pred,
*,
sample_weight=None,
alpha=0.5,
multioutput="uniform_average",
):
# NumPy implementation of scikit-learn's d2_pinball_score
if not (0.0 <= float(alpha) <= 1.0):
raise ValueError("alpha must be in [0, 1]")
y_true = np.asarray(y_true)
y_pred = np.asarray(y_pred)
if y_true.shape != y_pred.shape:
raise ValueError(f"y_true and y_pred must have the same shape, got {y_true.shape} vs {y_pred.shape}")
Y_true = _as_2d(y_true)
Y_pred = _as_2d(y_pred)
n_samples, n_outputs = Y_true.shape
if n_samples < 2:
warnings.warn("D^2 score is not well-defined with less than two samples.")
return float("nan")
if sample_weight is None:
w = None
else:
w = np.asarray(sample_weight).reshape(-1)
if w.shape[0] != n_samples:
raise ValueError(f"sample_weight must have shape (n_samples,), got {w.shape}")
if np.any(w < 0):
raise ValueError("sample_weight must be non-negative")
if w.sum() == 0:
raise ValueError("sum(sample_weight) must be positive")
numerator = mean_pinball_loss_numpy(
Y_true,
Y_pred,
sample_weight=w,
alpha=alpha,
multioutput="raw_values",
)
if w is None:
yq = np.percentile(Y_true, q=alpha * 100, axis=0)
else:
yq = _weighted_percentile_lower(Y_true, sample_weight=w, percentile=alpha * 100)
y_quantile = np.tile(yq, (n_samples, 1))
denominator = mean_pinball_loss_numpy(
Y_true,
y_quantile,
sample_weight=w,
alpha=alpha,
multioutput="raw_values",
)
nonzero_numerator = numerator != 0
nonzero_denominator = denominator != 0
valid_score = nonzero_numerator & nonzero_denominator
output_scores = np.ones(n_outputs, dtype=float)
output_scores[valid_score] = 1 - numerator[valid_score] / denominator[valid_score]
output_scores[nonzero_numerator & ~nonzero_denominator] = 0.0
if isinstance(multioutput, str):
if multioutput == "raw_values":
return output_scores
if multioutput == "uniform_average":
return float(np.mean(output_scores))
raise ValueError("multioutput must be 'raw_values' or 'uniform_average' or an array-like")
else:
weights = np.asarray(multioutput).reshape(-1)
if weights.shape[0] != n_outputs:
raise ValueError(f"multioutput weights must have shape (n_outputs,), got {weights.shape}")
return float(np.average(output_scores, weights=weights))
# Quick checks vs scikit-learn
alpha = 0.9
# 1D
y_true = rng.normal(size=80)
y_pred = y_true + rng.normal(0, 0.8, size=80)
print("1D")
print("numpy :", d2_pinball_score_numpy(y_true, y_pred, alpha=alpha))
print("sklearn:", d2_pinball_score(y_true, y_pred, alpha=alpha))
# Multioutput + weights
Y_true = rng.normal(size=(120, 3))
Y_pred = Y_true + rng.normal(0, 0.5, size=(120, 3))
w = rng.uniform(0.2, 2.0, size=120)
print()
print("Multioutput (raw)")
print("numpy :", d2_pinball_score_numpy(Y_true, Y_pred, sample_weight=w, alpha=alpha, multioutput="raw_values"))
print("sklearn:", d2_pinball_score(Y_true, Y_pred, sample_weight=w, alpha=alpha, multioutput="raw_values"))
print()
print("Multioutput (uniform_average)")
print("numpy :", d2_pinball_score_numpy(Y_true, Y_pred, sample_weight=w, alpha=alpha, multioutput="uniform_average"))
print("sklearn:", d2_pinball_score(Y_true, Y_pred, sample_weight=w, alpha=alpha, multioutput="uniform_average"))
# Constant-target edge case
y_const = np.ones(10)
print()
print("Constant target:")
print(
"perfect:",
d2_pinball_score_numpy(y_const, y_const, alpha=alpha),
d2_pinball_score(y_const, y_const, alpha=alpha),
)
print(
"imperfect:",
d2_pinball_score_numpy(y_const, np.zeros_like(y_const), alpha=alpha),
d2_pinball_score(y_const, np.zeros_like(y_const), alpha=alpha),
)
1D
numpy : -0.5638604228888011
sklearn: -0.5638604228888011
Multioutput (raw)
numpy : [-0.1124 -0.1877 -0.2139]
sklearn: [-0.1124 -0.1877 -0.2139]
Multioutput (uniform_average)
numpy : -0.17130461752532478
sklearn: -0.17130461752532478
Constant target:
perfect: 1.0 1.0
imperfect: 0.0 0.0
5) Using D² pinball while optimizing a model (from scratch)#
On a fixed dataset (y), the baseline loss (L_\alpha(y, \tilde{y})) depends only on (y) (and (\alpha)), not on the model parameters (\theta).
So:

$$\arg\max_\theta D^2_\alpha(\theta) = \arg\min_\theta L_\alpha\big(y, \hat{y}(\theta)\big)$$

Maximizing (D^2_\alpha) is equivalent to minimizing the pinball loss, and their gradients differ only by a constant scale:

$$\nabla_\theta \, D^2_\alpha = -\frac{1}{L_\alpha(y, \tilde{y})} \, \nabla_\theta \, L_\alpha\big(y, \hat{y}(\theta)\big)$$
So in practice:
train by minimizing pinball loss (a proper training objective)
track D² pinball as a score on train/validation sets
Subgradient for a linear model#
Let (\hat{y}_i = x_i^\top w) and (u_i = y_i - \hat{y}_i). For one sample, a valid subgradient of (\rho_\alpha(u_i)) w.r.t. (\hat{y}_i) is:

$$\frac{\partial \rho_\alpha}{\partial \hat{y}_i} = \begin{cases} -\alpha & u_i > 0 \\ 1 - \alpha & u_i < 0 \\ 0 & u_i = 0 \end{cases}$$
This is enough to do (sub)gradient descent.
# Synthetic data with skewed / heteroscedastic noise
n = 350
x = rng.uniform(-2, 3, size=n)
noise = (0.6 + 0.3 * (x - x.min())) * rng.normal(size=n)
noise += (rng.random(n) < 0.15) * rng.exponential(scale=3.0, size=n) # positive outliers
y = 1.0 + 2.0 * x + noise
X = x.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
alpha = 0.9
def fit_linear_quantile_regression_sgd(X, y, *, alpha, lr=0.05, n_steps=600):
# Fit y ~= b + X w by minimizing mean pinball loss with subgradient descent
X = np.asarray(X)
y = np.asarray(y).reshape(-1)
n = X.shape[0]
X_design = np.column_stack([np.ones(n), X.reshape(n, -1)]) # intercept + features
w = np.zeros(X_design.shape[1])
# Baseline loss for D² tracking (constant alpha-quantile)
baseline_pred = np.full_like(y, np.percentile(y, alpha * 100), dtype=float)
baseline_loss = mean_pinball_loss_numpy(y, baseline_pred, alpha=alpha)
loss_hist = []
d2_hist = []
for _ in range(n_steps):
y_pred = X_design @ w
u = y - y_pred
# subgradient d rho / d y_pred
grad_pred = np.where(u > 0, -alpha, 1 - alpha)
grad_pred[u == 0] = 0.0
grad_w = (X_design.T @ grad_pred) / n
w = w - lr * grad_w
loss = mean_pinball_loss_numpy(y, y_pred, alpha=alpha)
d2 = 1 - loss / baseline_loss if baseline_loss != 0 else (1.0 if loss == 0 else 0.0)
loss_hist.append(loss)
d2_hist.append(d2)
return w, np.array(loss_hist), np.array(d2_hist)
w_qr, loss_hist, d2_hist = fit_linear_quantile_regression_sgd(X_train, y_train, alpha=alpha, lr=0.05, n_steps=700)
w_qr
array([3.0198, 2.2158])
# Optimization diagnostics
iters = np.arange(len(loss_hist))
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=("Train pinball loss vs iteration", "Train D² pinball vs iteration"),
)
fig.add_trace(go.Scatter(x=iters, y=loss_hist, mode="lines"), row=1, col=1)
fig.update_xaxes(title_text="iteration", row=1, col=1)
fig.update_yaxes(title_text="L_alpha", row=1, col=1)
fig.add_trace(
go.Scatter(x=iters, y=d2_hist, mode="lines", line=dict(color="#4C78A8")),
row=1,
col=2,
)
fig.add_hline(y=0, line_dash="dash", line_color="gray", row=1, col=2)
fig.update_xaxes(title_text="iteration", row=1, col=2)
fig.update_yaxes(title_text="D²", row=1, col=2)
fig.update_layout(height=360, title=f"Subgradient descent for alpha={alpha}")
fig.show()
# Compare to OLS (mean regression) on the same alpha
# Design matrices
X_train_design = np.column_stack([np.ones(len(X_train)), X_train])
X_test_design = np.column_stack([np.ones(len(X_test)), X_test])
# Our quantile regressor
y_test_pred_qr = X_test_design @ w_qr
# OLS (targets conditional mean)
ols = LinearRegression().fit(X_train, y_train)
y_test_pred_ols = ols.predict(X_test)
print(f"Test D² pinball (alpha={alpha}):")
print(" quantile GD :", d2_pinball_score(y_test, y_test_pred_qr, alpha=alpha))
print(" OLS (mean) :", d2_pinball_score(y_test, y_test_pred_ols, alpha=alpha))
print()
print("Test mean pinball loss:")
print(" quantile GD :", mean_pinball_loss(y_test, y_test_pred_qr, alpha=alpha))
print(" OLS (mean) :", mean_pinball_loss(y_test, y_test_pred_ols, alpha=alpha))
print()
print("Coverage P(y <= y_pred):")
print(" quantile GD :", float(np.mean(y_test <= y_test_pred_qr)))
print(" OLS (mean) :", float(np.mean(y_test <= y_test_pred_ols)))
# Fit visualization
x_line = np.linspace(x.min(), x.max(), 250)
X_line = x_line.reshape(-1, 1)
X_line_design = np.column_stack([np.ones(len(x_line)), X_line])
y_line_qr = X_line_design @ w_qr
y_line_ols = ols.intercept_ + ols.coef_[0] * x_line
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode="markers", name="data", marker=dict(size=6, opacity=0.55)))
fig.add_trace(go.Scatter(x=x_line, y=y_line_ols, mode="lines", name="OLS (mean)", line=dict(color="gray", dash="dash")))
fig.add_trace(go.Scatter(x=x_line, y=y_line_qr, mode="lines", name=f"Quantile GD (alpha={alpha})", line=dict(color="#4C78A8", width=3)))
fig.update_layout(
title="Linear quantile regression: fitted line vs OLS",
xaxis_title="x",
yaxis_title="y",
height=420,
)
fig.show()
Test D² pinball (alpha=0.9):
quantile GD : 0.26390620369117923
OLS (mean) : -0.1955308060885197
Test mean pinball loss:
quantile GD : 0.35437222526113604
OLS (mean) : 0.5755556075140252
Coverage P(y <= y_pred):
quantile GD : 0.9090909090909091
OLS (mean) : 0.5795454545454546
6) Practical usage (scikit-learn)#
d2_pinball_score is most useful when you care about quantiles rather than the mean:
prediction intervals / uncertainty bands (e.g. 10th and 90th percentile)
asymmetric costs (under-predicting is worse than over-predicting, or vice versa)
risk metrics like Value-at-Risk (VaR)
In scikit-learn you typically pair it with a quantile model such as QuantileRegressor:
from sklearn.linear_model import QuantileRegressor
from sklearn.metrics import d2_pinball_score
alpha = 0.9
model = QuantileRegressor(quantile=alpha, alpha=0.0).fit(X_train, y_train)
y_pred = model.predict(X_test)
d2_pinball_score(y_test, y_pred, alpha=alpha)
For cross-validation you can build a scorer:
from sklearn.metrics import make_scorer
scorer = make_scorer(d2_pinball_score, greater_is_better=True, alpha=0.9)
# scikit-learn QuantileRegressor (solves a convex optimization problem)
alpha_level = 0.9
qr = QuantileRegressor(quantile=alpha_level, alpha=0.0, solver="highs").fit(X_train, y_train)
y_test_pred_qr_sklearn = qr.predict(X_test)
d2 = d2_pinball_score(y_test, y_test_pred_qr_sklearn, alpha=alpha_level)
loss = mean_pinball_loss(y_test, y_test_pred_qr_sklearn, alpha=alpha_level)
(d2, loss, qr.coef_, qr.intercept_)
(0.245096850405492, 0.36342747394411146, array([2.2703]), 3.271546583795644)
Pros, cons, and when to use D² pinball#
Pros#
Quantile-aware: aligns evaluation with quantile regression and asymmetric costs
Baseline-relative: 0 means “no better than predicting the constant (\alpha)-quantile”
Robust-ish: pinball loss grows linearly in the residual magnitude (less outlier-sensitive than squared loss)
Works for intervals: evaluate different quantiles (e.g. 0.1 and 0.9) separately
Cons / pitfalls#
Requires choosing (\alpha): different quantiles answer different questions
Not scale-free across problems: like MAE, the underlying loss depends on the units of (y)
Can be negative: models can be arbitrarily worse than the constant-quantile baseline
Non-smooth objective: training involves subgradients or specialized solvers (LP, coordinate descent, etc.)
Good use cases#
forecasting with prediction intervals (demand, energy, latency, finance)
service-level problems (e.g. “allocate enough capacity so that 90% of days we’re covered”)
risk-sensitive settings (VaR-style quantiles)
Exercises#
Plot mean pinball loss vs constant prediction (c) for several (\alpha) values and verify the minimizer moves from low to high quantiles.
Fit two models for (\alpha=0.1) and (\alpha=0.9); plot the resulting prediction interval and estimate empirical coverage.
Create a dataset where D² pinball is strongly negative and interpret it in terms of numerator vs denominator.
Show numerically that alpha=0.5 makes pinball loss proportional to MAE.
References#
scikit-learn API: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.d2_pinball_score.html
scikit-learn API: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_pinball_loss.html
Koenker & Machado (1999): Goodness of fit for quantile regression (Eq. 7): https://doi.org/10.1080/01621459.1999.10473882
Hastie, Tibshirani, Wainwright (2015): Statistical Learning with Sparsity (pinball deviance discussion): https://hastie.su.domains/StatLearnSparsity/